Abstract

The configuration model was originally defined for undirected networks and
has recently been extended to directed networks. Many empirical networks are,
however, neither undirected nor completely directed, but instead partially
directed, meaning that certain edges are directed and others are undirected. In this
paper we define a configuration model for such networks where nodes have in-,
out-, and undirected degrees that may be dependent. We prove conditions under
which the resulting degree distributions converge to the intended degree
distributions. The new model is shown to better approximate several empirical networks
compared to undirected and completely directed networks.


1 Introduction 

Graphs appear in many current applications. In social sciences groups of people are 
often modeled by letting the vertices in the graph represent persons and edges represent 
the interactions or relationships between them. Edges can be directed or undirected, 
the latter indicating a reciprocal relationship between the vertices.

Usually the graphs created from such datasets are simplifications of the original 
dataset. One typical simplification is to allow only directed or only undirected edges. 
However, in real world graphs it is common to find a combination of directed and
undirected edges. In [2] we find some examples of empirical graphs where the proportion
of directed edges is in the range 0.26-0.85, the rest being undirected edges. Additional
examples are shown in Table 1 where the proportion of directed edges has been
calculated for some social networks that can be found in [7]. We expect such graphs to
be better represented by partially directed graphs, where we allow both directed and
undirected edges.
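
As an illustration of how such a proportion can be computed from a directed edge list, the following is a minimal Python sketch. It assumes that a reciprocal pair of arcs is counted as one undirected edge and that the proportion is taken over directed plus undirected edges; this convention is our reading of Table 1 rather than a stated definition, and the function name `proportion_directed` is hypothetical.

```python
def proportion_directed(arcs):
    """Proportion of directed edges when a directed graph is viewed as partially directed.

    Assumption: each reciprocal pair (u, v), (v, u) is merged into a single
    undirected edge, and self-loops and duplicate arcs are ignored.
    """
    arcs = {(u, v) for u, v in arcs if u != v}
    # Count each reciprocal pair once by only looking at the arc with u < v.
    reciprocal_pairs = sum(1 for u, v in arcs if u < v and (v, u) in arcs)
    directed = len(arcs) - 2 * reciprocal_pairs
    total = directed + reciprocal_pairs
    return directed / total if total else 0.0


# Toy example: one reciprocal pair (0 <-> 1) and two one-way arcs.
toy_arcs = [(0, 1), (1, 0), (1, 2), (2, 3)]
toy_proportion = proportion_directed(toy_arcs)
```

In the toy example two of the three resulting edges are directed.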

The configuration model has been used extensively to model undirected networks
[4, 3]. It has also been adapted to work for directed graphs [1]. In the
configuration model the graph is constructed by first assigning a degree to each vertex of the
graph and then connecting the edges uniformly at random. The degrees of the vertices
of the graph are either given as a degree sequence or the degrees are drawn from some
given degree distribution. Graphs created in this way will share some properties with
Data set            # vertices    # edges       Proportion directed

soc-LiveJournal1    4 847 571     42 851 237    0.402
soc-Epinions1       75 879        506 585       0.996
soc-Pokec           1 632 803     22 301 964    0.627
soc-Slashdot0922    82 168        504 230       0.274
email-EuAll         265 214       310 006       0.851
wiki-Vote           7 115         100 762       0.971
wiki-Talk           2 394 385     4 659 565     0.922

Table 1: Proportion of directed edges for some data sets from [7], when viewed as
partially directed graphs. We see that several of these graphs have a substantial proportion
of undirected edges and of directed edges, such that neither type should be ignored.


real world graphs, but will be different in other aspects. E.g. the configuration model 
for directed networks will have a very low proportion of reciprocal edges, i.e. two 
parallel directed edges in opposite directions. This is an effect of connecting edges 
uniformly at random in this type of graph. This can be undesirable if we wish to use 
the configuration model graph as a null reference to compare with a real-world graph. 
While we wish to connect the edges uniformly at random, we may want to preserve the 
degree distribution, including any dependence between the indegrees, outdegrees and 
undirected degrees. 

In this paper we consider a partially directed configuration model where we allow
both directed and undirected edges. Any vertex in such a partially directed
configuration model graph can have all three types of edges: incoming, outgoing and undirected.
We select the degree of each vertex from a given joint, three-dimensional degree
distribution and we do not assume or require the in-, out- and undirected degrees to be
independent. When connecting the edges, outgoing edges can only connect to
incoming edges and undirected edges can only connect to undirected edges. Once all edges
are connected we make the graph simple and thus do not allow self-loops or parallel
edges of any type. We make the graph simple by erasing conflicting edges and by
converting parallel directed edges in opposite directions into undirected edges. Since
this process modifies the degree of some of the vertices, it is not certain that the
empirical degree distribution converges to the degree distribution we started with. However,
in Section 2 we show that, with suitable restrictions on the first moments of the degree
distribution, the degree distribution asymptotically converges to the desired one.

Note that, by selecting a joint degree distribution in the proper way we can also 
create completely directed graphs or completely undirected graphs, with or without any 
dependence between the degrees. Thus the presented partially directed configuration 
model incorporates several of the already existing models. 

In the next section, Section 2, we present definitions and state the main result of the
paper. Detailed derivations and proofs have been postponed to Section 4. To illustrate
how these graphs work, Section 3 is devoted to some simulations of partially directed
graphs, showing results for small and for large n. The latter is to give an intuitive feeling
for the asymptotic results and the former is to illustrate that significant deviations from
these asymptotic results are possible for small n. A comparison with an empirical social
network is also made. Conclusions and discussion can be found in Section 5.
2 Definitions and Results


In this section we define the configuration model for partially directed graphs. We
define the terminology used, how the graph is created from a degree distribution, how the
graph is made simple and finally show, with suitable restrictions on the first moments
of the degree distribution, that the degree distribution of the partially directed
configuration model graph asymptotically converges to the desired distribution. Proofs are left
for Section 4.


2.1 Terminology 


A graph consists of vertices and of edges. The size of the graph, the number of
vertices, is denoted n. Here we will specifically study the case when $n \to \infty$. We work with
graphs that are partially directed, meaning that any vertex can have incoming edges,
outgoing edges and undirected edges. We distinguish between edges and stubs. By
stubs we mean yet unconnected half-edges of a vertex. In the same way as edges, stubs
can be in-stubs, out-stubs and undirected stubs. The number of stubs of the different
types is the degree of a vertex and will be denoted $d = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})$, where the
individual terms represent the indegree, outdegree and undirected degree, respectively.
When the degree of the vertex is a random quantity, it is denoted $D = (D^{\leftarrow}, D^{\rightarrow}, D^{\leftrightarrow})$.

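A degree sequence that is non-random is denoted $\mathbf{d} = \{(d_r^{\leftarrow}, d_r^{\rightarrow}, d_r^{\leftrightarrow})\}$,
$r = 1, \ldots, n$, where n is the number of vertices in the graph. When these degree
sequences are random vectors they are denoted $\mathbf{D} = \{D_r\} = \{(D_r^{\leftarrow}, D_r^{\rightarrow}, D_r^{\leftrightarrow})\}$.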

Degrees can be assigned to the vertices from some given joint degree distribution
with distribution function F for which the probability of a specific combination of
indegree, outdegree and undirected degree is called $p_{ijk} = P(D = (i, j, k))$. We
will also use the marginal distributions. We have $p_i^{\leftarrow} = p_{i \cdot \cdot} = \sum_{j,k} p_{ijk}$ for the incoming
edges, $p_j^{\rightarrow} = p_{\cdot j \cdot} = \sum_{i,k} p_{ijk}$ for the outgoing edges and $p_k^{\leftrightarrow} = p_{\cdot \cdot k} = \sum_{i,j} p_{ijk}$ for the
undirected edges. The corresponding random variables, i.e. the number of edges of
each type, will be denoted $D^{\leftarrow}$, $D^{\rightarrow}$ and $D^{\leftrightarrow}$.

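Other quantities of interest are the moments of the distribution. Here we will
consider the first moments $\mu^{\leftarrow} = E[D^{\leftarrow}] = \sum_i i \, p_i^{\leftarrow}$, $\mu^{\rightarrow} = E[D^{\rightarrow}] = \sum_j j \, p_j^{\rightarrow}$ and
$\mu^{\leftrightarrow} = E[D^{\leftrightarrow}] = \sum_k k \, p_k^{\leftrightarrow}$.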
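
To make the marginal distributions and first moments concrete, the following minimal Python sketch computes them from a joint probability table. The toy distribution `joint` and the function name `marginals_and_means` are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def marginals_and_means(p):
    """p maps a degree triple (i, j, k) = (in, out, undirected) to its probability.

    Returns the three marginal distributions and their first moments.
    """
    p_in, p_out, p_und = defaultdict(float), defaultdict(float), defaultdict(float)
    for (i, j, k), prob in p.items():
        p_in[i] += prob    # sum over j and k
        p_out[j] += prob   # sum over i and k
        p_und[k] += prob   # sum over i and j

    def mean(marginal):
        return sum(value * prob for value, prob in marginal.items())

    marginals = (dict(p_in), dict(p_out), dict(p_und))
    means = (mean(p_in), mean(p_out), mean(p_und))
    return marginals, means


# Hypothetical toy joint distribution with equal in- and out-degree means.
joint = {(0, 0, 1): 0.5, (1, 1, 0): 0.25, (2, 2, 3): 0.25}
marginals, means = marginals_and_means(joint)
```

For this toy distribution the in- and out-degree means are both 0.75 and the undirected mean is 1.25, so the means of the in- and out-degrees agree.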

A graph is simple if there are no unconnected stubs, no self-loops and no parallel 
edges. 

For a finite graph of size n we also want to count the number of vertices with a
certain degree d. We call this quantity $N_d^{(n)}$. Dividing by n we can calculate $N_d^{(n)}/n$, the
proportion of vertices that have degree d. Whenever the graph is created
by some random process, we can also consider the expectation of this random quantity,
$p_d^{(n)} = E[N_d^{(n)}/n]$, which defines the distribution function $F^{(n)}$.


2.2 Defining the Model 

We define the partially directed configuration model as follows: 

1. We start with a graph with n vertices, but without any edges or stubs. 

2. For each vertex, we independently draw a degree from F at random.

3. We connect undirected stubs with other undirected stubs. We do this by
picking two undirected stubs uniformly at random and connecting them. We repeat
this with the remaining unconnected undirected stubs until there is at most one
undirected stub left.

4. We connect directed incoming stubs with directed outgoing stubs. We do this
by picking one directed incoming stub and one directed outgoing stub, both
independently and uniformly at random and then connecting them. We repeat this
with the remaining unconnected directed stubs until we are out of incoming stubs
or outgoing stubs (or both).

5. We want the graph to be simple, but the connection process may have left some
stubs unconnected and may also have created self-loops and parallel edges. We
make the graph simple by erasing some stubs and edges. We define the procedure
in such a way that the connectivity of the graph is maintained:

(a) Erase all unconnected stubs. There can be at most one unconnected
undirected stub, while there may be a larger number of unconnected directed
stubs, either all incoming or all outgoing, if the number of in-stubs is not
equal to the number of out-stubs.

(b) Erase all self-loops, both directed and undirected. 

(c) When there are parallel identical edges, erase all except one of them. 

(d) Erase all directed edges that are parallel to an undirected edge. 

(e) Erase each pair of reciprocal directed edges and add a single undirected 
edge instead. While this step decreases the number of directed edges, it 
also increases the number of undirected edges. 

From the above description we see that there are two non-deterministic steps that
affect the degrees of the vertices in the creation of the simple partially directed graph:

1. Assigning degrees from the distribution F.

2. Connecting the stubs uniformly at random. While this does not, in itself, modify
the degrees of the vertices, it affects which stubs and edges will be erased
when making the graph simple.

This process results in a finite graph for which the degree distribution cannot be
expected to be the same as F. However, we later show that, with suitable restrictions
on the distribution F, the distribution $F^{(n)}$ that was defined above asymptotically
approaches F.

For the scale-free distribution simulated in Section 3, if the value of $\gamma$ had been closer to 2,
then the average number of deleted stubs would not have decreased as clearly as it does
now, indicating that the average number of deleted edges then decreases only slowly
with the graph size. This still would not in itself contradict convergence in
distribution since a large proportion of the deleted edges can then be attributed to a small
number of vertices of high degree, and so would not affect the overall convergence of
the degree distribution.
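
The construction in Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the simulations in this paper: the function names and the toy distribution `toy_F` are hypothetical, and uniform random pairing of stubs is realized by shuffling the stub lists and pairing neighbouring entries.

```python
import random


def partially_directed_configuration_model(degrees, rng):
    """degrees: list of (d_in, d_out, d_und), one triple per vertex.

    Returns (directed, undirected): a set of arcs (u, v) meaning u -> v and a
    set of undirected edges (u, v) stored with u < v.
    """
    in_stubs, out_stubs, und_stubs = [], [], []
    for v, (d_in, d_out, d_und) in enumerate(degrees):
        in_stubs += [v] * d_in
        out_stubs += [v] * d_out
        und_stubs += [v] * d_und
    rng.shuffle(in_stubs)
    rng.shuffle(out_stubs)
    rng.shuffle(und_stubs)

    # Steps 3, 4 and 5(a): pair the stubs; zip silently drops leftover stubs.
    und_pairs = zip(und_stubs[0::2], und_stubs[1::2])
    dir_pairs = zip(out_stubs, in_stubs)

    # Steps 5(b) and 5(c): drop self-loops; sets remove identical parallel edges.
    undirected = {(min(u, v), max(u, v)) for u, v in und_pairs if u != v}
    directed = {(u, v) for u, v in dir_pairs if u != v}

    # Step 5(d): erase directed edges parallel to an undirected edge.
    directed = {(u, v) for u, v in directed
                if (min(u, v), max(u, v)) not in undirected}

    # Step 5(e): replace each reciprocal pair by one undirected edge.
    reciprocal = {(u, v) for u, v in directed if (v, u) in directed}
    directed -= reciprocal
    undirected |= {(min(u, v), max(u, v)) for u, v in reciprocal}
    return directed, undirected


def simple_degrees(n, directed, undirected):
    """(in, out, undirected) degree of every vertex in the simple graph."""
    deg = [[0, 0, 0] for _ in range(n)]
    for u, v in directed:
        deg[u][1] += 1
        deg[v][0] += 1
    for u, v in undirected:
        deg[u][2] += 1
        deg[v][2] += 1
    return [tuple(d) for d in deg]


rng = random.Random(0)
toy_F = [(0, 0, 1), (1, 1, 0), (1, 1, 2), (2, 2, 1)]  # hypothetical, uniform weights
degrees = [rng.choice(toy_F) for _ in range(1000)]     # step 2
directed, undirected = partially_directed_configuration_model(degrees, rng)
after = simple_degrees(len(degrees), directed, undirected)
unchanged = sum(a == b for a, b in zip(after, degrees)) / len(degrees)
```

The quantity `unchanged` is the proportion of vertices whose degree survived the erasure steps; it corresponds to the event studied in the proofs of Section 4.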

2.3 Asymptotic Convergence of the Degree Distribution 

The results in this section are inspired by, and to some degree follow [5]. The theorem 
establishes the asymptotic convergence of the degree distribution. 

Theorem 1. If F has finite mean for each component, so $\mu^{\leftarrow} < \infty$, $\mu^{\rightarrow} < \infty$ and
$\mu^{\leftrightarrow} < \infty$, and also $\mu^{\leftarrow} = \mu^{\rightarrow}$, then, as $n \to \infty$,

a) $F^{(n)} \to F$,

b) $N_d^{(n)}/n \xrightarrow{P} p_d$, that is, the empirical distribution converges in probability to F.

The proof, which is postponed to Section 4, follows the same line of reasoning 
as in [5], but with modifications to take into account the complications introduced by 
allowing both directed and undirected edges in the graph. 

3 Examples of Partially Directed Graphs 

Although Theorem 1 establishes the asymptotic convergence of the degree distribution, 
it remains to see how well this holds for finite graphs. In this section we investigate
this by looking at a scale-free distribution, at a Poisson degree distribution and at an 
empirical network. Since we are working with a joint degree distribution, in addition 
to the distribution for each of the three stub types we also need to consider the possible 
dependence between the different types. Table 2 gives an overview of how the data for 
the plots were created. 

Since Theorem 1 focuses on showing convergence to the correct degree
distribution, studying the total variation distance $d_{TV}^{(n)}$ (defined in Section 3.1) is of interest
(see e.g. [8]). We also study the number of erased edges as a function of the graph
size. Finally, we study the size of the strongly connected giant component and the
distribution of small components for a few different graphs based on the empirical data
from LiveJournal. The dataset soc-LiveJournal1 [7] is a directed graph created from the
declaration of friends in a social internet community. The original graph contains
self-loops, but these have been removed in this analysis. The simple graph has a proportion
of directed edges of about 0.4, so this is a good example of a graph where both directed
and undirected edges play an important role. When sampling from this distribution to
create the configuration model graph, the degrees of vertices from the original (partially
directed) graph were drawn independently and uniformly at random, with replacement.
Thus the frequencies of the degrees found in the graph were used as the given
distribution F and this distribution function is then compared with the distribution $F^{(n)}$ created
by sampling from F, connecting the edges and making the graph simple.
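
The two ways of resampling empirical degrees (summarized as "independent" and "dependent" in Table 2) can be sketched as follows. This is a minimal Python illustration; the function names and the toy list `empirical` are hypothetical stand-ins for the degree triples of the real dataset.

```python
import random


def sample_dependent(empirical, n, rng):
    """Draw complete degree triples with replacement, keeping the dependence
    between in-, out- and undirected degree of each empirical vertex."""
    return [rng.choice(empirical) for _ in range(n)]


def sample_independent(empirical, n, rng):
    """Draw each stub type separately with replacement, which preserves the
    three marginal distributions but removes the dependence between them."""
    ins = [d[0] for d in empirical]
    outs = [d[1] for d in empirical]
    unds = [d[2] for d in empirical]
    return [(rng.choice(ins), rng.choice(outs), rng.choice(unds)) for _ in range(n)]


# Hypothetical stand-in for the (in, out, undirected) degrees of an empirical graph.
empirical = [(0, 0, 1), (5, 5, 2), (1, 0, 0)]
rng = random.Random(1)
dependent_degrees = sample_dependent(empirical, 10, rng)
independent_degrees = sample_independent(empirical, 10, rng)
```

In the dependent variant every sampled triple occurs in the empirical data, whereas the independent variant can produce combinations such as (5, 0, 1) that no empirical vertex has.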

3.1 Total Variation Distance 

Theorem 1 states that $N_d^{(n)}/n \xrightarrow{P} p_d$ and thus we define the following version of the total
variation distance:

$$d_{TV}^{(n)} = \frac{1}{2} \sum_d \left| \frac{N_d^{(n)}}{n} - p_d \right| \qquad (1)$$

where the 1/2 is introduced so that $d_{TV}^{(n)}$ can only take on values in the range [0,1].
As $n \to \infty$ we expect to see that the total variation distance tends towards zero. When
we generate the graphs according to the configuration model we replace $N_d^{(n)}$ with the
corresponding empirical sample $\tilde{n}_d^{(n)}$ from one realization of a random graph. We can
then repeat this process with more samples of random graphs and plot this. The result is
shown in Figure 1, where we have also taken the average of the empirical total variation
distance for 100 random graph samples.
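
The empirical total variation distance for one realization can be computed as in the following minimal Python sketch. The function name and the toy inputs are hypothetical; degrees are represented as (in, out, undirected) tuples and the target distribution as a dictionary of probabilities.

```python
from collections import Counter


def total_variation_distance(degrees, p):
    """Half the summed absolute difference between the observed degree
    frequencies and the target probabilities p, over all degree triples."""
    n = len(degrees)
    counts = Counter(degrees)
    support = set(counts) | set(p)
    return 0.5 * sum(abs(counts.get(d, 0) / n - p.get(d, 0.0)) for d in support)


# Toy example: one of four vertices ended up with a degree outside the target support.
target = {(0, 0, 1): 0.5, (1, 1, 0): 0.5}
observed = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (2, 0, 0)]
toy_distance = total_variation_distance(observed, target)
```

In the toy example the observed frequencies are 0.5, 0.25 and 0.25, so the distance is 0.5 * (0 + 0.25 + 0.25) = 0.25.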

In Figure 1 we see that the total variation distance decreases towards zero. The 
fastest decrease is for the Poisson graph, and the reason is that this distribution has a 
light tail when compared with the scale-free distribution. A closer look at the empirical
Degree distribution: Empirical

  Method: Empirical degree data from the dataset soc-LiveJournal1 [7] was used. This is
  data from an on-line social site. Some characteristics can be found in Table 1. The mean
  degrees for in-degrees, out-degrees and undirected degrees are 3.6, 3.6 and 10.6,
  respectively (not shown). When viewed as a directed graph and counting all stubs this
  gives a total mean degree of approximately 28.3. Here we count each undirected edge as
  two edges since it consists of an incoming and an outgoing edge, when viewed as an edge
  in a directed graph. Both the directed and the undirected edges have degree distributions
  that are approximately scale-free in the tail, with $\gamma_{directed} \approx 2.5$ and
  $\gamma_{undirected} \approx 3.5$ (not shown).

  Independent: Each stub type is treated individually and independent samples are drawn,
  with replacement, for each vertex and each stub type.

  Dependent: Independent samples of complete vertices are drawn, with replacement, from
  the pool of empirical vertices.

Degree distribution: Scale-free

  Method: The selected distribution function is $F(k) = 1 - [\ldots]^{-(\gamma - 1)}$ with [...],
  where $\zeta$ is the Riemann zeta function. The tail of this distribution is asymptotically
  proportional to $k^{-\gamma}$. This specific distribution function was selected because of its
  scale-free property, while still being easy to simulate from using a discrete variant of
  the inverse transformation method (see [9, Section 11.2.1 and Example 11.7]). For all
  simulations $\gamma = 2.5$, which is the coefficient for the directed edges in the empirical
  graph. This value gives finite expectation (approximately 2.7), but infinite variance.
  This is consistent with the assumptions in Theorem 1.

  Independent: For each vertex and each stub type an independent sample from the assigned
  distribution was drawn.

  Dependent: For each vertex an independent sample from the assigned distribution was
  drawn and the same degree was assigned to all stub types of the vertex.

Degree distribution: Poisson

  Method: Degrees drawn from a Poisson distribution with parameter 7, thus having mean
  degree 7. When treated as a directed graph and counting all stubs the total mean degree
  is 28, close to the value 28.3 for the empirical graph above.

  Independent: See above.

  Dependent: See above.


Table 2: Explanation of how the graphs were created.


graph reveals that the distributions for the directed and the undirected edges look much
like a scale-free distribution. The in- and the out-degree have $\gamma \approx 2.5$ and the undirected
degree has $\gamma \approx 3.5$ in the tail (not shown). Thus the tail for the empirical distribution
is heavier than for the Poisson distribution and so we can expect a slower convergence
for the empirical graph. Even slower convergence has been observed (not shown) for
values of $\gamma$ even closer to 2, e.g. $\gamma = 2.1$. This is not surprising as the distribution
then becomes more heavy-tailed. If we continue even further, to $\gamma < 2$, the conditions
used in the proof of Theorem 1 no longer hold, since the expectations are no longer
finite, and thus we should not expect the total variation distance to converge to zero for
these values of $\gamma$.

From the figure we also see that the dependent curve for the Poisson distribution
is clearly lower than the independent curve. One explanation for this is that when the 
degrees for in-stubs and the out-stubs are identical for each vertex, as in the dependent 
Figure 1: The total variation distance versus graph size for three different degree
distributions, with independent or dependent in-, out- and undirected degrees for the stubs.
Each data point shows the average of 100 simulations. All curves decrease towards
zero.


graph (as defined in Table 2), the total number of in-stubs will be equal to the total
number of out-stubs and thus no directed stubs will be deleted for this reason. There
may still be self-loops and parallel edges, but for the Poisson graph these are few
compared to the number of stubs deleted in the independent graph (as defined in Table
2) where there is a mismatch between the number of in-stubs and the number of
out-stubs. For the empirical graph and for the scale-free graph the same phenomenon
cannot be observed. One explanation for this is that the scale-free independent model
is not dominated by the deletion of leftover directed edges. Instead the number of
self-loops and parallel edges is of the same order of magnitude as the number of leftover
directed edges (see Figure 2). Thus the difference between the curves for the total variation
distance is much smaller for the scale-free and for the empirical graph.

Another possible reason why the empirical graph does not show a big difference between
the dependent and the independent curve is that the dependent version of the
empirical graph does not have the same type of complete dependence as for the scale-free
or the Poisson graph. In the empirical dependent graph, degrees are assigned by
sampling the degrees of vertices from the original empirical graph, and thus the number of
in-stubs will in general not equal the number of out-stubs. Looking at Figure 2 we see
that the number of directed unconnected edges is almost the same for the independent 
version as for the dependent version of the empirical graph. Looking instead at the 
same plot for the Poisson graph we note that the deletion of directed unconnected stubs 
dominates the independent version of the graph, while there are no such deleted stubs 
in the dependent version of the graph. 
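
The mismatch between in-stubs and out-stubs in the independent Poisson case can be illustrated numerically. With independent Poisson(7) in- and out-degrees, the difference between the two stub totals has variance 2 * 7 * n, so the number of leftover directed stubs per vertex is of order $\sqrt{14/n}$, while it is exactly zero when the in- and out-degree of each vertex are equal. The following minimal Python sketch checks this; the function names are hypothetical and the Poisson sampler is a standard textbook method (Knuth's multiplication method), not necessarily the one used for the figures.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method
    (adequate for a small parameter such as 7)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def leftover_directed_stubs_per_vertex(n, lam, dependent, rng):
    """|total in-stubs - total out-stubs| / n for one draw of n vertex degrees."""
    total_in = total_out = 0
    for _ in range(n):
        d_in = poisson(lam, rng)
        d_out = d_in if dependent else poisson(lam, rng)
        total_in += d_in
        total_out += d_out
    return abs(total_in - total_out) / n


def average_leftover(n, lam, dependent, rng, repetitions=20):
    """Average of the per-vertex leftover over several independent draws."""
    runs = [leftover_directed_stubs_per_vertex(n, lam, dependent, rng)
            for _ in range(repetitions)]
    return sum(runs) / repetitions


rng = random.Random(2024)
for n in (100, 1000):
    print(n, average_leftover(n, 7, False, rng), average_leftover(n, 7, True, rng))
```

The independent column should shrink roughly like one over the square root of n, and the dependent column is identically zero.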
3.2 The Average Number of Erased Edges per Vertex

The number of erased edges will depend on the degree distribution and on the graph size,
and will also be different each time a graph is created according to the configuration
model. In Figure 2 the average number of erased edges per vertex is plotted. Each
point corresponds to the average of 100 simulations of random graphs according to
the partially directed configuration model. The deleted edges were classified according to the
reason why they were deleted, as defined in the rules in Section 2.2.

For all plots, the graphs indicate that the average number of deleted stubs or edges
per vertex decreases with the size of the graph. Thus the risk of any vertex having
its degree affected by the deletion of a stub or an edge also goes down, and this indicates that
the degree distribution converges to F asymptotically. The scale-free distribution
is more difficult since for $\gamma < 2$ neither the variance nor the expectation exists. Here we
have selected $\gamma = 2.5$ for the scale-free graph. This value gives finite expectation, but
infinite variance.

As already briefly mentioned in Section 3.1, for the scale-free and for the Poisson
curves there are no deleted directed stubs for the dependent plots. This is because of
how the dependent graphs are created. In these graphs, each vertex has the same
number of in-stubs and out-stubs. Thus there will not be any directed stubs left over after
the graph has been connected so no such stubs will be deleted. For the empirical graph
this is not the case since the dependent version of the graph is created by sampling
from the empirical degrees of the vertices, and for these the number of in-stubs in
general does not equal the number of out-stubs. In fact we note that the average number of
deleted directed stubs per vertex seems to be approximately equal for the independent and
the dependent version of the empirical graph, possibly indicating a quite poor
correlation between in-stubs and out-stubs in the original graph. Another difference between
the graphs is that for the scale-free dependent graph there are many more deleted
directed reciprocal edges, deleted directed self-loops and deleted directed edges that are
parallel with an undirected edge, compared with the independent scale-free graph. This
can be explained by the heavy tail of the scale-free distribution. For instance assume
that some vertex has a very high degree. Since the degrees are dependent (equal, in this
case), the risk is much higher that there will be self-loops among the directed edges.
Also, since the undirected degree will also be high for this vertex, the risk of having
directed edges in parallel with the undirected edges also increases. Finally the chance
of getting reciprocal directed edges also increases. This risk is high if there are many
vertices with high degrees. In the dependent case if two vertices have many in-stubs
both will also have many out-stubs, increasing the chance of parallel edges between
these.


3.3 The Strongly Connected Components 

Finally we study the strongly connected components in the original data from
LiveJournal, compared with the configuration model based on partially directed stubs and
also on directed stubs. For any vertex i we define the out-component of vertex i as the 
set of all vertices that can be reached from vertex i by following the edges of the graph 
and respecting how they are directed. In the same way the in-component of vertex i 
is defined as the set of all vertices from which we can reach vertex i. The intersection 
of the out-component and the in-component defines the strongly connected component 
of vertex i. Any two vertices i, j where we can reach j from i and i from j have the 
same strongly connected component. Thus the graph can be uniquely divided into a set 
[Figure 2, six panels, each with graph size n on the horizontal axis:
(a) Empirical, independent in-, out- and undirected degrees;
(b) Empirical, dependent in-, out- and undirected degrees;
(c) Scale-free, independent in-, out- and undirected degrees;
(d) Scale-free, dependent in-, out- and undirected degrees;
(e) Poisson, independent in-, out- and undirected degrees;
(f) Poisson, dependent in-, out- and undirected degrees.]

Figure 2: Number of erased edges divided by the number of vertices for the scale-free
configuration model with parameter $\gamma = 2.5$, for the Po(7) model and for the empirical
configuration model. Each data point shows the average of 100 simulations.
Figure 3: The figure shows the proportion of all the vertices in the graph that belong
to strongly connected components other than the largest component. Three plots are
made: The first is for the original empirical graph, using the connectivity of the
original dataset. The second one is for the configuration model for partially directed graphs
with the same degree distribution as for the empirical partially directed graph. The
third one is for the configuration model for directed graphs with the same degree
distribution as the empirical graph, when viewed as a directed graph. The second and third
plots are based on averages of 10 simulations - results are similar for each simulation.
Note that the third plot consists of only a single point, since for this plot all small
components only consist of a single vertex. The total number of vertices in the graph is
4 847 571. The relative size of the largest component (not shown in the plot) is 0.7898
for the original graph, 0.8039 for the partially directed configuration model and 0.8026
for the directed configuration model (the last two based on averages of 10 simulations,
with the standard deviation being approximately 0.0002).
of strongly connected components. Here we study the strongly connected components
of the empirical graph and also of configuration model graphs created by using the
degree sequence of the empirical graph as the given degree distribution. The largest
component in the graph corresponds to the notion of a giant component, the size of
which is proportional to the size of the graph. The size of the giant component for
these simulations can be compared with theoretical results for a configuration model
graph with given degree distribution (see [6, page 5]). By plugging in the empirical
degree distribution of the LiveJournal dataset, we get that the theoretical size of the
giant component is 0.8040 for the partially directed graph, and 0.8028 for the directed
graph. These values show a good match with the simulation data presented in Figure
3.
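
The definition of the strongly connected component of a vertex as the intersection of its out-component and in-component translates directly into code. The following minimal Python sketch uses breadth-first search and treats each undirected edge as traversable in both directions; the function names and the toy graph are hypothetical. For graphs of the size of LiveJournal a linear-time algorithm for all components at once (such as Tarjan's) would be the more practical choice.

```python
from collections import defaultdict, deque


def reachable(start, adjacency):
    """All vertices reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def strongly_connected_component(vertex, directed, undirected):
    """Intersection of the out-component and the in-component of vertex."""
    forward, backward = defaultdict(set), defaultdict(set)
    for u, v in directed:                 # arc u -> v
        forward[u].add(v)
        backward[v].add(u)
    for u, v in undirected:               # usable in both directions
        for a, b in ((u, v), (v, u)):
            forward[a].add(b)
            backward[b].add(a)
    out_component = reachable(vertex, forward)
    in_component = reachable(vertex, backward)
    return out_component & in_component


# Toy graph: a directed 3-cycle, one arc leaving it, and one undirected edge.
toy_directed = {(0, 1), (1, 2), (2, 0), (2, 3)}
toy_undirected = {(3, 4)}
component_of_0 = strongly_connected_component(0, toy_directed, toy_undirected)
component_of_3 = strongly_connected_component(3, toy_directed, toy_undirected)
```

In the toy graph, vertices 3 and 4 form a strongly connected component of size two only because of the undirected edge between them, which is the mechanism behind the small components of size larger than one in the partially directed model.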

It is not surprising that the largest component is largest in the configuration model
for the partially directed graph. The original empirical graph is likely to have
sub-communities that may connect only weakly to other communities, thus reducing the
total size of the largest strongly connected component, but of course increasing the
number of moderately sized strongly connected components. The directed graph lacks
the undirected edges and thus the largest strongly connected component will not
include vertices that are connected to it only via a directed edge (in one direction only).
Thus its largest strongly connected component will be smaller than for the partially
directed graph.

The variation in size among the medium-sized components in
Figure 3 is largest for the original empirical graph. For the configuration model on
the directed graph all other components consist only of single vertices, while for the
configuration model on the partially directed graph components of size 1-4 exist. The
appearance of some larger small components for the partially directed graph is caused
by the undirected edges, compared with only directed edges for the completely directed
graph, as was already mentioned above.


4 Proofs 


In this section we provide a proof of Theorem 1. The first part of the proof closely
follows [5], with modifications for the joint distribution. In [5] the proof is for the
undirected graph, and the addition of the directed edges makes things more complicated.
There are mainly two things that need more detailed treatment: the 3-dimensional
degree distribution and the fact that combining undirected and directed edges in the same
graph creates new reasons for why edges are erased, affecting the empirical degree
distribution and thus also, possibly, the asymptotic behavior of it. The first part of the
proof, which is similar to [5], has been moved to two lemmas (Lemma 1 and Lemma 2) to
make the part of the proof that is specific for the partially directed configuration model
graph more accessible. A third lemma (Lemma 3) that helps in the final part of the
proof of Theorem 1 has also been included.


Lemma 1. $N_d^{(n)}/n \xrightarrow{P} p_d$ implies $F^{(n)} \to F$ as $n \to \infty$.

Proof.

i) $N_d^{(n)}/n \xrightarrow{P} p_d$ and $0 \le N_d^{(n)}/n \le 1$ imply $E[N_d^{(n)}/n] \to p_d$ by bounded convergence
[8, page 180].

ii) $E[N_d^{(n)}/n] = p_d^{(n)}$ implies $p_d^{(n)} \to p_d$ for all d.

iii) Since (ii) is valid for any d we have $F^{(n)} \to F$ as $n \to \infty$.

□


In Lemma 2 we need a few definitions that are used both in the lemma and in the
proof of it. Let $M_r^{(n)}$ be an indicator variable that shows if vertex r has had its degree
modified in the process of creating a simple configuration model graph of size n. The
total number of modified vertices can then be calculated by summing all of these and
we define $M^{(n)} = \sum_{r=1}^{n} M_r^{(n)}$.

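Lemma 2. If $P(M_r^{(n)} = 0 \mid D_r = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})) \to 1$ for all $d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow}$ and for arbitrary r,
then $N_d^{(n)}/n \xrightarrow{P} p_d$ as $n \to \infty$.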

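Proof.

i) Let $\tilde{N}_d^{(n)}$ be the number of vertices with degree d before any stub has been erased or
added. By the law of large numbers we have that $\tilde{N}_d^{(n)}/n \to p_d$ as $n \to \infty$. Since
we want to show that $N_d^{(n)}/n \xrightarrow{P} p_d$ it is enough to show that
$(N_d^{(n)} - \tilde{N}_d^{(n)})/n \xrightarrow{P} 0$ as $n \to \infty$.

ii) We note that modifying the degree of a vertex affects not only the number of
vertices with the original degree, but also the number of vertices with the new degree,
thus $|N_d^{(n)} - \tilde{N}_d^{(n)}|$ can be less than $M^{(n)}$. However, we can still be sure that
$|N_d^{(n)} - \tilde{N}_d^{(n)}| \le M^{(n)}$. We wish to show that $M^{(n)}/n \xrightarrow{P} 0$, i.e. that
$P(M^{(n)}/n > \varepsilon) \to 0$ as $n \to \infty$, $\forall \varepsilon > 0$.

iii) Using Markov's inequality and that $M^{(n)} \ge 0$ we get

$$P\left( \frac{M^{(n)}}{n} > \varepsilon \right) \le \frac{E[M^{(n)}/n]}{\varepsilon}, \quad \forall \varepsilon > 0. \qquad (2)$$

Thus it is enough to show that $E[M^{(n)}/n] \to 0$.

iv) The $M_r^{(n)}$ are identically distributed since the numbering of the vertices is
arbitrary and so $E[M^{(n)}/n] = E[M_1^{(n)}] = P(M_1^{(n)} = 1)$, where vertex 1 has been
chosen arbitrarily. We want to show that $P(M_1^{(n)} = 1) \to 0$ or, equivalently,
that $P(M_1^{(n)} = 0) \to 1$ as $n \to \infty$.

v) Conditioning on the degree of vertex 1 gives

$$P(M_1^{(n)} = 0) = \sum_{d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow}} P(M_1^{(n)} = 0 \mid D_1 = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})) \, P(D_1 = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})). \qquad (3)$$

Since we know

$$\sum_{d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow}} P(D_1 = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})) = \sum_{d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow}} p_{d^{\leftarrow} d^{\rightarrow} d^{\leftrightarrow}} = 1 \qquad (4)$$

it is enough to show that

$$P(M_1^{(n)} = 0 \mid D_1 = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})) \to 1 \quad \forall d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow} \text{ as } n \to \infty. \qquad (5)$$

□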
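
For orientation, the chain of bounds used in the proof of Lemma 2 can be collected in a single display. Here $N_d^{(n)}$ is the number of vertices with degree d in the simple graph, $M^{(n)}$ is the number of vertices whose degree was modified and $M_1^{(n)}$ is the indicator for vertex 1; the symbol $\tilde{N}_d^{(n)}$ for the number of vertices with degree d before any erasure is a notational assumption made for this summary.

```latex
\[
P\!\left( \left| \frac{N_d^{(n)}}{n} - \frac{\tilde{N}_d^{(n)}}{n} \right| > \varepsilon \right)
\;\le\; P\!\left( \frac{M^{(n)}}{n} > \varepsilon \right)
\;\le\; \frac{1}{\varepsilon}\, E\!\left[ \frac{M^{(n)}}{n} \right]
\;=\; \frac{1}{\varepsilon}\, P\!\left( M_1^{(n)} = 1 \right),
\qquad \forall \varepsilon > 0 .
\]
```

The first inequality holds because a degree count can change by at most the number of modified vertices, the second is Markov's inequality, and the final equality uses that the indicators are identically distributed. The right-hand side tends to zero as soon as the probability that vertex 1 keeps its degree tends to one.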


Lemma 3. Let $\{X_m\}$ be a sequence of non-negative random variables and let X be a
non-negative random variable. Also let $0 < a < \infty$ be a real number.

If $X_m \xrightarrow{d} X$ as $m \to \infty$, $\lim_{m \to \infty} E[X_m] \le a$ and $E[X] = a$, then $\lim_{m \to \infty} E[X_m] = a$.

Proof. For non-negative random variables $\{Y_m\}$ Fatou's lemma states

$$E\left[\liminf_{m \to \infty} Y_m\right] \le \liminf_{m \to \infty} E[Y_m]. \qquad (6)$$

We apply Skorokhod's representation theorem and can thus define $\{Y_m\}$ and Y (all on
the same probability space) to have the same distribution as $\{X_m\}$ and X, and $Y_m \to Y$
almost surely as $m \to \infty$.

Developing the left and right hand side of Fatou's lemma now gives:

$$\mathrm{LHS} = E\left[\liminf_{m \to \infty} Y_m\right] = E[Y] = E[X] = a, \qquad (7)$$

$$\mathrm{RHS} = \liminf_{m \to \infty} E[Y_m] \le \lim_{m \to \infty} E[Y_m] = \lim_{m \to \infty} E[X_m] \le a. \qquad (8)$$

Thus

$$\lim_{m \to \infty} E[X_m] = a. \qquad (9)$$

□ 
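
The role of the hypothesis $\lim E[X_m] \le a$ in Lemma 3 can be illustrated with a small exact computation. The Python sketch below uses a hypothetical example that is not taken from the paper: $X_m = 1 + m B_m$ with $B_m$ Bernoulli(1/m), so that $X_m$ converges in distribution to $X = 1$ (hence $a = 1$) while $E[X_m] = 2$ for every m. The function names are illustrative only.

```python
from fractions import Fraction


def expectation_x_m(m: int) -> Fraction:
    """Exact E[X_m] for X_m = 1 + m * B_m, with B_m ~ Bernoulli(1/m)."""
    p = Fraction(1, m)  # P(B_m = 1)
    return (1 + m) * p + 1 * (1 - p)


def prob_x_m_differs_from_limit(m: int) -> Fraction:
    """P(X_m != 1); it tends to 0, so X_m -> X = 1 in distribution."""
    return Fraction(1, m)


if __name__ == "__main__":
    for m in (1, 10, 1000):
        print(m, expectation_x_m(m), prob_x_m_differs_from_limit(m))
```

Here Fatou's lemma only gives $1 = E[X] \le \liminf E[X_m] = 2$; the expectations do not converge to $E[X]$ because the upper bound required by Lemma 3 fails.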


Now we are ready to prove the main theorem.

Proof of Theorem 1.

1. Lemma 1 shows that Theorem 1 (b) implies (a). 

2. It remains to prove Theorem 1 (b). Lemma 2 simplifies this process.

Let $M_1^{(n)}$ be the indicator variable for the event that a specific vertex (arbitrarily
selected to be vertex 1) has had its degree modified when creating a simple configuration
model graph of size n according to the procedure defined in Section
2.2. Also let the degree of vertex 1 be $D_1 = d = (d^{\leftarrow}, d^{\rightarrow}, d^{\leftrightarrow})$. According to
Lemma 2, in order to prove (b) it is sufficient to show that

$$P\left(M_1^{(n)} = 0 \mid D_1 = d\right) \to 1 \quad \forall d \text{ as } n \to \infty. \qquad (10)$$

3. Remembering that we do not allow self loops or parallel edges, $M_1^{(n)} = 0$ exactly
when each stub from vertex 1 is saved. In total, vertex 1 has $d = d^{\leftarrow} + d^{\rightarrow} + d^{\leftrightarrow}$
stubs (with a slight abuse of notation, d denotes both the degree vector and the total
number of stubs) and these are all saved only when all of them successfully attach to other
matching stubs, all from different vertices selected from vertices $\{2, \ldots, n\}$. In all
other cases the degree of vertex 1 will surely be modified, giving no contribution
to the probability of $M_1^{(n)} = 0$.

Now, if we knew the degrees of all the vertices, it would be easy to calculate the
probability of $M_1^{(n)} = 0$. We do this simply by considering all events where the
stubs of vertex 1 connect to different vertices and then sum all the probabilities
of these events. It is thus natural to continue the proof by conditioning on the
degrees of vertices $\{2, \ldots, n\}$. Let the degrees of vertices $\{2, \ldots, n\}$ be $D^{(n)} =
\{D_2, \ldots, D_n\}$, where the $D_r$ are i.i.d. from F. Then we want to study

$$P\left(M_1^{(n)} = 0 \mid D_1 = d\right) = E\left[P\left(M_1^{(n)} = 0 \mid D_1 = d, D^{(n)}\right)\right]. \qquad (11)$$


4. We now look more closely at the conditional probability

$$P\left(M_1^{(n)} = 0 \mid D_1 = d, D^{(n)}\right), \qquad (12)$$

where $D^{(n)} = d^{(n)} = \{d_2, \ldots, d_n\}$ is a specific outcome of the degrees of the
vertices. From this we see that the total numbers of stubs of each type are

$$s^{(n)}_{\leftarrow} = \sum_{r=1}^{n} d_r^{\leftarrow}, \quad s^{(n)}_{\rightarrow} = \sum_{r=1}^{n} d_r^{\rightarrow}, \quad s^{(n)}_{\leftrightarrow} = \sum_{r=1}^{n} d_r^{\leftrightarrow}.$$

We want to know where each stub
of vertex 1 attempts to connect and define a set of indices, $i = \{i_1, \ldots, i_{d^{\leftarrow}}\}$,
$j = \{j_1, \ldots, j_{d^{\rightarrow}}\}$ and $k = \{k_1, \ldots, k_{d^{\leftrightarrow}}\}$. Any set of values of these indices we
call a save-attempt, indicating that we try to save all stubs of vertex 1 from being
erased, by attempting to connect the stubs of vertex 1 to matching stubs from the
vertices pointed to by these indices.

Given the degrees of all vertices we can calculate the probability of any such 
save-attempt. First some basic observations: 

(a) If any one of the selected vertices does not have a matching stub, the probability
of the save-attempt is zero. As an example, assume that an in-stub
attempts to connect to vertex 2, but vertex 2 does not have any out-stub at
all. Then this event will have probability zero.

(b) As a consequence, for the save-attempt to have a probability larger than 
zero, all the vertices that the stubs of vertex 1 attempt to connect to must 
have matching stubs. 

As an example, take a look at the save-attempt where each stub of vertex 1 tries
to connect to the other vertices in order. The indices then take on the values $\{i_1 =
2, i_2 = 3, \ldots, k_{d^{\leftrightarrow}-1} = d, k_{d^{\leftrightarrow}} = d + 1\}$. For now, we ignore the possibility that
there may not be enough matching stubs of vertices $\{2, \ldots, n\}$ to accommodate
all the stubs of vertex 1. We do this now to make the main argument clearer, but
we correct the equations for this special case later in the proof.

First we look at in-stub 1 from vertex 1. Since we are working with the configuration
model, this stub has an equal chance of connecting to any of the matching
stubs. Thus the probability that in-stub 1 from vertex 1 connects to any of the
out-stubs from vertex 2 is

$$\frac{d_2^{\rightarrow}}{s^{(n)}_{\rightarrow}}, \qquad (13)$$

where $s^{(n)}_{\rightarrow}$ is the total number of out-stubs.

Once in-stub 1 of vertex 1 has connected to vertex 2 we continue with in-stub
2 of vertex 1. Once again the configuration model tells us that this stub has an
equal chance of connecting to any of the remaining matching stubs. Thus the
probability of it connecting to any of the out-stubs from vertex 3 is

$$\frac{d_3^{\rightarrow}}{s^{(n)}_{\rightarrow} - 1}. \qquad (14)$$

We can continue in the same way with the rest of the in-stubs, then the out-stubs
and finally the undirected stubs of vertex 1. For the undirected stubs we note that 
we need to subtract 2 stubs every time we connect one stub, since the undirected 
stubs connect to other undirected stubs. 


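Now we can calculate the probability of this specific save-attempt and find that,
with the indices $i_r$, $j_r$ and $k_r$ taking the values given above, it is

$$\prod_{r=1}^{d^{\leftarrow}} \frac{d^{\rightarrow}_{i_r}}{s^{(n)}_{\rightarrow} - r + 1} \cdot \prod_{r=1}^{d^{\rightarrow}} \frac{d^{\leftarrow}_{j_r}}{s^{(n)}_{\leftarrow} - r + 1} \cdot \prod_{r=1}^{d^{\leftrightarrow}} \frac{d^{\leftrightarrow}_{k_r}}{s^{(n)}_{\leftrightarrow} - 2r + 1}. \qquad (15)$$

In the expression we have ignored that we have already used up $d^{\leftarrow}$ out-stubs
when connecting the in-stubs of vertex 1. We correct for this in the final
expressions given later in the proof.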

Here we explicitly see that this expression is equal to zero iff any one of the 
degrees in the numerator is zero. Otherwise it will be positive, but always less 
than or equal to 1. 

To shorten the expressions we will call each of the three parts of Eq. (15) $q^{(n)}_{\leftarrow,i}$,
$q^{(n)}_{\rightarrow,j}$ and $q^{(n)}_{\leftrightarrow,k}$, respectively, where the arrow indicates what type of stub in vertex
1 we are dealing with.

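Now we are ready to write down the expression for the conditional probability
in Eq. (12). We need to sum Eq. (15) over all values of i, j and k, such that all
sub-indices are different, i.e. pointing to different vertices. We arrive at

$$P\left(M_1^{(n)} = 0 \mid D_1 = d, D^{(n)}\right) = \sum_{\substack{i, j, k \\ \text{all sub-indices different}}} q^{(n)}_{\leftarrow,i}\, q^{(n)}_{\rightarrow,j}\, q^{(n)}_{\leftrightarrow,k}. \qquad (16)$$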


The number of terms in the sum will be $(n-1)(n-2) \cdots (n-d)$, which is
simply the number of different ways in which we can select the d indices out of
the n - 1 possible vertices. Note that these combinations of indices include the
ones we are interested in, where all stubs of vertex 1 are saved. Note also that
the sum includes some combinations that we are not interested in, but all of these
have probability zero and so it does not matter if we include them in the sum or
not.
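
The sum over save-attempts can be made concrete for a tiny graph. The following Python sketch is only an illustration under stated assumptions: it evaluates the uncorrected product of Eq. (15) with exact fractions, enumerates all choices of distinct target vertices as in Eq. (16), and uses a toy degree outcome chosen so that no denominator is zero. The function names, the 0-based vertex numbering (vertex 0 plays the role of vertex 1) and the toy degrees are assumptions made here, not part of the paper.

```python
from fractions import Fraction
from itertools import permutations


def save_attempt_probability(degrees, i, j, k):
    """Uncorrected Eq. (15) for one save-attempt.

    degrees[r] = (in, out, undirected) degree of vertex r; index 0 plays the
    role of vertex 1 in the text. i, j, k list the target vertices of its
    in-, out- and undirected stubs.
    """
    s_in = sum(d[0] for d in degrees)
    s_out = sum(d[1] for d in degrees)
    s_und = sum(d[2] for d in degrees)
    p = Fraction(1)
    for r, v in enumerate(i):  # in-stubs look for out-stubs
        p *= Fraction(degrees[v][1], s_out - r)
    for r, v in enumerate(j):  # out-stubs look for in-stubs
        p *= Fraction(degrees[v][0], s_in - r)
    for r, v in enumerate(k):  # undirected: two stubs are used per connection
        p *= Fraction(degrees[v][2], s_und - 2 * r - 1)
    return p


def all_save_attempts(degrees):
    """Yield (i, j, k) over all choices of distinct target vertices."""
    d_in, d_out, d_und = degrees[0]
    others = range(1, len(degrees))
    for targets in permutations(others, d_in + d_out + d_und):
        yield (targets[:d_in],
               targets[d_in:d_in + d_out],
               targets[d_in + d_out:])


degrees = [(1, 1, 1)] * 5  # toy outcome with n = 5 and d = 3
attempts = list(all_save_attempts(degrees))
total = sum(save_attempt_probability(degrees, *a) for a in attempts)
print(len(attempts), total)  # 24 terms = 4 * 3 * 2, each equal to 1/100
```

With n = 5 and d = 3 the enumeration produces (n - 1)(n - 2)(n - 3) = 24 terms, matching the count stated above. The total is that of the uncorrected expression only; the corrections introduced later in the proof are not applied.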


5. We now need to deal with a few complications that will lead to corrections to
$q^{(n)}_{\leftarrow,i}$, $q^{(n)}_{\rightarrow,j}$ and $q^{(n)}_{\leftrightarrow,k}$.


(a) If the number of stubs of vertex 1 (d) is larger than the number of available
nodes (n - 1), then it is not possible to select all sub-indices to be different.
However, since d is fixed, this is always resolved as $n \to \infty$. In the
following we will always assume that n > d.

(b) There may be a mismatch in the number of stubs. If the number of undirected
stubs is odd, there will be one extra stub. Let $v^{(n)}$ be the number of
such stubs. Clearly $v^{(n)}$ can only be 0 or 1.

In the same way the number of in-stubs may differ from the number of out-stubs.
Let $w^{(n)} = s^{(n)}_{\leftarrow} - s^{(n)}_{\rightarrow}$, the difference between the number of in-stubs
and the number of out-stubs. Clearly $w^{(n)}$ can be negative, zero or positive.
If $v^{(n)}$ or $w^{(n)}$ are not zero then some stubs will remain unconnected.
In the following we will deal with both of these by imagining two extra
pools of stubs, of size $v^{(n)}$ and $|w^{(n)}|$, respectively. These pools behave
just as any normal vertex and any stub has an equal probability to connect
to any allowed stub, including these two pools. They are thus added to the
denominators in Eq. (15).

(c) As mentioned before, we have included some events that have probability
zero in the sum. Although the numerator is always zero for these, in some
special cases the denominator may also become zero. This happens when
there are not enough matching stubs to accommodate all the stubs of vertex
1. Of course we could define 0/0 := 0, but here we instead choose to correct
the denominator so that it does not become zero. We do this correction
by adding an extra indicator variable to the denominator. Whenever this
happens, the numerator is still zero, so the sum is not changed.
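
The stub mismatch described in (b) is simple to compute from a degree outcome. The Python sketch below is a minimal illustration; the function name and the representation of degrees as (in, out, undirected) triples are assumptions made here for the example.

```python
def stub_mismatch(degrees):
    """Return (v, w) for a list of (in, out, undirected) degree triples."""
    s_in = sum(d[0] for d in degrees)
    s_out = sum(d[1] for d in degrees)
    s_und = sum(d[2] for d in degrees)
    v = s_und % 2     # one leftover undirected stub if the total is odd
    w = s_in - s_out  # surplus of in-stubs (negative: surplus of out-stubs)
    return v, w


print(stub_mismatch([(1, 0, 1), (2, 1, 1), (0, 1, 1)]))  # (1, 1)
```

In this example there are three in-stubs, two out-stubs and three undirected stubs, so one undirected stub and one in-stub cannot be matched.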

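The corrected versions of $q^{(n)}_{\leftarrow,i}$, $q^{(n)}_{\rightarrow,j}$ and $q^{(n)}_{\leftrightarrow,k}$ are thus

$$q^{(n)}_{\leftarrow,i} = \prod_{r=1}^{d^{\leftarrow}} \frac{d^{\rightarrow}_{i_r}}{s^{(n)}_{\rightarrow} - r + 1 + w^{(n)} \mathbf{1}_{\{w^{(n)} > 0\}} + d^{\leftarrow} \mathbf{1}_{\{\text{too few matching stubs}\}}}, \qquad (17)$$

$$q^{(n)}_{\rightarrow,j} = \prod_{r=1}^{d^{\rightarrow}} \frac{d^{\leftarrow}_{j_r}}{s^{(n)}_{\leftarrow} - d^{\leftarrow} - r + 1 - w^{(n)} \mathbf{1}_{\{w^{(n)} < 0\}} + (d^{\leftarrow} + d^{\rightarrow}) \mathbf{1}_{\{\text{too few matching stubs}\}}}, \qquad (18)$$

$$q^{(n)}_{\leftrightarrow,k} = \prod_{r=1}^{d^{\leftrightarrow}} \frac{d^{\leftrightarrow}_{k_r}}{s^{(n)}_{\leftrightarrow} - 2r + 1 + v^{(n)} + 2 d^{\leftrightarrow} \mathbf{1}_{\{\text{too few matching stubs}\}}}. \qquad (19)$$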

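6. To be able to obtain an expression for the probability in Eq. (12) we need to
replace the degrees in Eqs. (17)-(19) with their stochastic counterpart to obtain

$$P\left(M_1^{(n)} = 0 \mid D_1 = d, D^{(n)}\right) = \sum_{\substack{i, j, k \\ \text{all sub-indices different}}} Q^{(n)}_{\leftarrow,i}\, Q^{(n)}_{\rightarrow,j}\, Q^{(n)}_{\leftrightarrow,k}, \qquad (20)$$

where

$$Q^{(n)}_{\leftarrow,i} = \prod_{r=1}^{d^{\leftarrow}} \frac{D^{\rightarrow}_{i_r}}{S^{(n)}_{\rightarrow} - r + 1 + W^{(n)} \mathbf{1}_{\{W^{(n)} > 0\}} + d^{\leftarrow} \mathbf{1}_{\{\text{too few matching stubs}\}}}, \qquad (21)$$

$$Q^{(n)}_{\rightarrow,j} = \prod_{r=1}^{d^{\rightarrow}} \frac{D^{\leftarrow}_{j_r}}{S^{(n)}_{\leftarrow} - d^{\leftarrow} - r + 1 - W^{(n)} \mathbf{1}_{\{W^{(n)} < 0\}} + (d^{\leftarrow} + d^{\rightarrow}) \mathbf{1}_{\{\text{too few matching stubs}\}}}, \qquad (22)$$

$$Q^{(n)}_{\leftrightarrow,k} = \prod_{r=1}^{d^{\leftrightarrow}} \frac{D^{\leftrightarrow}_{k_r}}{S^{(n)}_{\leftrightarrow} - 2r + 1 + V^{(n)} + 2 d^{\leftrightarrow} \mathbf{1}_{\{\text{too few matching stubs}\}}}. \qquad (23)$$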




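Here the uppercase variables are all the stochastic counterparts of the lowercase
variables defined previously.

7. Now we can continue with Eq. (11):

$$P\left(M_1^{(n)} = 0 \mid D_1 = d\right) = E\left[P\left(M_1^{(n)} = 0 \mid D_1 = d, D^{(n)}\right)\right] \qquad (24)$$

$$= E\left[\sum_{\substack{i, j, k \\ \text{all sub-indices different}}} Q^{(n)}_{\leftarrow,i}\, Q^{(n)}_{\rightarrow,j}\, Q^{(n)}_{\leftrightarrow,k}\right] \qquad (25)$$

$$= \sum_{\substack{i, j, k \\ \text{all sub-indices different}}} E\left[Q^{(n)}_{\leftarrow,i}\, Q^{(n)}_{\rightarrow,j}\, Q^{(n)}_{\leftrightarrow,k}\right] \qquad (26)$$

$$= (n-1)(n-2) \cdots (n-d)\, E\left[Q^{(n)}_{\leftarrow,i}\, Q^{(n)}_{\rightarrow,j}\, Q^{(n)}_{\leftrightarrow,k}\right] \qquad (27)$$

$$= \frac{(n-1) \cdots (n-d)}{n^{d}}\, E\left[\left(n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i}\right)\left(n^{d^{\rightarrow}} Q^{(n)}_{\rightarrow,j}\right)\left(n^{d^{\leftrightarrow}} Q^{(n)}_{\leftrightarrow,k}\right)\right] \qquad (28)$$

$$= c^{(n)}\, E\left[\left(n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i}\right)\left(n^{d^{\rightarrow}} Q^{(n)}_{\rightarrow,j}\right)\left(n^{d^{\leftrightarrow}} Q^{(n)}_{\leftrightarrow,k}\right)\right]. \qquad (29)$$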


Note 1: The expectation and the summation can be interchanged since all terms
are non-negative and since the summation does not depend on any random quantity
(as mentioned before).

Note 2: Since vertex degrees are drawn independently at random, all expectation 
terms in the sum are identical and we simply take the number of terms times the 
expectation of one of the terms instead of the sum. The number of terms was 
already discussed above. 

Note 3: $c^{(n)} = \frac{(n-1)(n-2) \cdots (n-d)}{n^{d}}$.
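
The factor $c^{(n)}$ in Note 3 approaches 1 quickly for fixed d. The Python sketch below evaluates it exactly, assuming the form $c^{(n)} = (n-1)(n-2) \cdots (n-d)/n^{d}$ that follows from the number of terms in the sum; the function name is illustrative.

```python
from fractions import Fraction


def c(n: int, d: int) -> Fraction:
    """c^(n) = (n-1)(n-2)...(n-d) / n^d, computed exactly."""
    value = Fraction(1)
    for r in range(1, d + 1):
        value *= Fraction(n - r, n)
    return value


for n in (10, 100, 10_000):
    print(n, float(c(n, 3)))
```

For d = 3 the value is 0.504 at n = 10 and differs from 1 by roughly 6/n for large n.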

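8. All that remains is to take the limit of Eq. (29). We start by studying the limit of
what is inside the expectation. Rewriting the first term we get

$$n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i} = \prod_{r=1}^{d^{\leftarrow}} \frac{n D^{\rightarrow}_{i_r}}{S^{(n)}_{\rightarrow} - r + 1 + W^{(n)} \mathbf{1}_{\{W^{(n)} > 0\}} + d^{\leftarrow} \mathbf{1}_{\{\text{too few matching stubs}\}}} \qquad (30)$$

$$= \prod_{r=1}^{d^{\leftarrow}} \frac{D^{\rightarrow}_{i_r}}{\frac{S^{(n)}_{\rightarrow}}{n} - \frac{r-1}{n} + \frac{W^{(n)}}{n} \mathbf{1}_{\{W^{(n)} > 0\}} + \frac{d^{\leftarrow}}{n} \mathbf{1}_{\{\text{too few matching stubs}\}}}. \qquad (31)$$

The remaining outgoing and undirected terms will be very similar, producing the
additional terms $S^{(n)}_{\leftarrow}/n$, $S^{(n)}_{\leftrightarrow}/n$, $V^{(n)}/n$, $d^{\rightarrow}/n$ and $d^{\leftrightarrow}/n$ in the denominator.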


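Now note that, since $P\left(M_1^{(n)} = 0 \mid D_1 = d\right) \le 1$ and $\lim_{n \to \infty} c^{(n)} = 1$, we have that

$$\lim_{n \to \infty} E\left[\left(n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i}\right)\left(n^{d^{\rightarrow}} Q^{(n)}_{\rightarrow,j}\right)\left(n^{d^{\leftrightarrow}} Q^{(n)}_{\leftrightarrow,k}\right)\right] \le 1. \qquad (32)$$

By the law of large numbers and using Slutsky's Theorem

$$\left(n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i}\right)\left(n^{d^{\rightarrow}} Q^{(n)}_{\rightarrow,j}\right)\left(n^{d^{\leftrightarrow}} Q^{(n)}_{\leftrightarrow,k}\right) \xrightarrow{d} \prod_{r=1}^{d^{\leftarrow}} \frac{D^{\rightarrow}_{i_r}}{\mu_{\rightarrow}} \prod_{r=1}^{d^{\rightarrow}} \frac{D^{\leftarrow}_{j_r}}{\mu_{\leftarrow}} \prod_{r=1}^{d^{\leftrightarrow}} \frac{D^{\leftrightarrow}_{k_r}}{\mu_{\leftrightarrow}}, \qquad (33)$$

where $\mu_{\leftarrow}$, $\mu_{\rightarrow}$ and $\mu_{\leftrightarrow}$ denote the expected in-, out- and undirected degrees.
Here we used that $\lim_{n \to \infty} W^{(n)}/n = \lim_{n \to \infty} \left(S^{(n)}_{\leftarrow} - S^{(n)}_{\rightarrow}\right)/n = \mu_{\leftarrow} - \mu_{\rightarrow} = 0$, by assumption.

Since all $D^{\rightarrow}_{i_r}$, $D^{\leftarrow}_{j_r}$ and $D^{\leftrightarrow}_{k_r}$ are independent by construction we also have

$$E\left[\prod_{r=1}^{d^{\leftarrow}} \frac{D^{\rightarrow}_{i_r}}{\mu_{\rightarrow}} \prod_{r=1}^{d^{\rightarrow}} \frac{D^{\leftarrow}_{j_r}}{\mu_{\leftarrow}} \prod_{r=1}^{d^{\leftrightarrow}} \frac{D^{\leftrightarrow}_{k_r}}{\mu_{\leftrightarrow}}\right] = \left(\frac{E[D^{\rightarrow}]}{\mu_{\rightarrow}}\right)^{d^{\leftarrow}} \left(\frac{E[D^{\leftarrow}]}{\mu_{\leftarrow}}\right)^{d^{\rightarrow}} \left(\frac{E[D^{\leftrightarrow}]}{\mu_{\leftrightarrow}}\right)^{d^{\leftrightarrow}} = 1. \qquad (34)$$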


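Now we use Lemma 3 and immediately conclude that

$$\lim_{n \to \infty} E\left[\left(n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i}\right)\left(n^{d^{\rightarrow}} Q^{(n)}_{\rightarrow,j}\right)\left(n^{d^{\leftrightarrow}} Q^{(n)}_{\leftrightarrow,k}\right)\right] = 1 \qquad (35)$$

and thus also that

$$\lim_{n \to \infty} P\left(M_1^{(n)} = 0 \mid D_1 = d\right) = \lim_{n \to \infty} c^{(n)} E\left[\left(n^{d^{\leftarrow}} Q^{(n)}_{\leftarrow,i}\right)\left(n^{d^{\rightarrow}} Q^{(n)}_{\rightarrow,j}\right)\left(n^{d^{\leftrightarrow}} Q^{(n)}_{\leftrightarrow,k}\right)\right] = 1. \qquad (36)$$

This is what we wanted to show and so the proof is complete.
□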
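
The limit in Eq. (36) can be explored numerically. The Python sketch below is a rough Monte Carlo illustration, not the construction of Section 2.2: it estimates the probability of the event used in step 3 of the proof, namely that every stub of the tagged vertex attaches to a matching stub and that all partners are distinct other vertices. The degree distribution (each of the three degrees uniform on {0, 1, 2}, so that expected in- and out-degrees agree), the pairing by shuffled lists, the function name and the 0-based numbering (vertex 0 plays the role of vertex 1) are all assumptions made for this example.

```python
import random


def estimate_save_probability(n, d, trials=100, seed=0):
    """Estimate P(all stubs of vertex 0 attach to matching stubs of distinct
    other vertices | D_0 = d) by simulation.

    Assumption (not from the text): the other n - 1 vertices draw their in-,
    out- and undirected degrees independently and uniformly from {0, 1, 2}.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        degrees = [d] + [tuple(rng.randint(0, 2) for _ in range(3))
                         for _ in range(n - 1)]
        # One list entry per stub, holding the vertex that owns it.
        in_stubs = [v for v, deg in enumerate(degrees) for _ in range(deg[0])]
        out_stubs = [v for v, deg in enumerate(degrees) for _ in range(deg[1])]
        und_stubs = [v for v, deg in enumerate(degrees) for _ in range(deg[2])]
        for stubs in (in_stubs, out_stubs, und_stubs):
            rng.shuffle(stubs)
        partners = []  # vertices reached by the matched stubs of vertex 0
        # Directed edges u -> v: surplus in- or out-stubs stay unmatched.
        for u, v in zip(out_stubs, in_stubs):
            if u == 0:
                partners.append(v)
            if v == 0:
                partners.append(u)
        # Undirected edges: an odd leftover stub stays unmatched.
        for u, v in zip(und_stubs[0::2], und_stubs[1::2]):
            if u == 0:
                partners.append(v)
            if v == 0:
                partners.append(u)
        all_matched = len(partners) == sum(d)
        distinct_others = (0 not in partners
                           and len(set(partners)) == len(partners))
        successes += all_matched and distinct_others
    return successes / trials


for n in (20, 200, 1000):
    print(n, estimate_save_probability(n, (1, 1, 1)))
```

Under these assumptions the estimates should move towards 1 as n grows, in line with Eq. (36), although the exact values depend on the seed and on the assumed degree distribution.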


5 Conclusions and Discussion 

We have shown a simple way to create a partially directed configuration model graph
from a given joint degree distribution. The graph is simple, and under specified conditions
the degree distribution converges to the desired one. The proof is generic and
can be extended to any type of graph where stubs are saved from being erased if they
connect to other (unique) vertices. The only assumptions in the proof are that the degrees
of different vertices are independent, that the expectation of the degree of each
type of stub is finite and that the expectation of the degree for the in-stubs is equal to
the expectation for the degree of the out-stubs. This means that the proof works also
for undirected graphs and for directed graphs, and also if the number of different types
of stubs is increased to any finite number, as long as conditions similar to those in this
proof are fulfilled. Allowing for self loops and parallel edges only increases the chance of
saving a stub from being erased and so is not a problem.

The main advantage of using a partially directed model to represent empirical networks,
as opposed to using a completely directed or completely undirected model, is
that the partially directed model preserves the proportion of undirected edges. This
is important for networks where there is a significant proportion both of directed and
of undirected edges, and where none of the different types of edges can be ignored.
Examples of such graphs have been given in Table 1. The model also preserves any
dependence between directed and undirected degrees present in the original empirical
graph or the given degree distribution.

However, this model does not produce other structures that can often be found in
empirical networks. For example, it does not produce the same number of moderately sized
strongly connected components that we see in the empirical networks. In this respect it
does, however, perform slightly better than the configuration model on directed graphs.
Possible improvements towards realism would be to see how, for example, triangles (of different
types), different types of vertices and other heterogeneities could be included in the
model.